Spatial Bias in PM2.5 Monitoring Networks in Los Angeles County
DSAN 6750: GIS for Spatial Data Science
Introduction
Los Angeles County’s PM2.5 monitoring network has always struck me as oddly distributed. It underpins exposure estimates, compliance checks, and policy conversations, but the coverage still feels lopsided once you look past downtown and the freeway belt. I picked L.A. because freeway exhaust is a defining part of daily life there; if the monitors lean toward road corridors, the exposure math ends up skewed. So I set out to see how evenly the network samples space, whether the instruments really hug the transportation grid, and how that pattern might warp our collective sense of air quality. Putting everything into a public-facing page lets agencies and residents see the same evidence I'm seeing, rather than a polished deck with tidy bullets.
Background & Research Questions
Going in, I had two questions: (1) do the PM2.5 monitors pile up near major roads, and (2) even after accounting for that tendency, do the sites still show clustering beyond complete spatial randomness (CSR)? Those questions sound dry on paper, but they matter because biased siting throws off regional exposure estimates, exaggerates traffic pollution, and leaves quieter neighborhoods without data. Environmental justice concerns keep popping up in Los Angeles, so ignoring the spatial footprint of the monitoring program felt irresponsible. The study window covers all of Los Angeles County (10,598.78 km², computed in EPSG:3310) with 12 regulatory monitors that made it through quality control. TIGER/Line supplied the boundary and OpenStreetMap filled in the road network so I could line up monitors with the infrastructure shaping them.
Literature Review
Two threads in the literature pushed me toward this project. Environmental justice work keeps reminding us that communities of color shoulder disproportionate pollution near big roads, while the monitors that should document those loads show up late or never. At the same time, spatial statistics folks like Diggle (2013) warn that when we place monitors badly, every downstream model inherits those first- and second-order wobbles. Put differently, hugging freeways might catch the nastiest emissions, but it also risks missing inland valleys or the port, so the “average” picture looks cleaner than it feels. I leaned on those insights when choosing the regression and point-pattern tools below.
Methodology
I pulled daily PM2.5 values from EPA’s AQS (2016–2023) and joined the records to site metadata for land use, location type, and coordinates. TIGER/Line gave me the county boundary, OpenStreetMap provided the major-road network, and everything went into EPSG:3310 so distances mean something. After clipping to Los Angeles County, I averaged each station to a mean daily PM2.5 value and kept one record per regulatory site. The monitor points turned into a planar point pattern (monitor_ppp), the county polygon acted as the study window, and the road lines supplied the distance covariate. From there I walked through kernel intensity estimates, pair-correlation and L-function envelopes with 999 simulations, and finally an inhomogeneous Poisson regression to see whether intensity falls off with distance from big roads. Nothing fancy—just enough structure to keep the analysis reproducible (prepare_data.qmd, eda.qmd, and methods.qmd have the code).
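The distance covariate at the heart of that pipeline can be sketched outside spatstat too. Here is a minimal Python sketch of the idea, with synthetic projected coordinates standing in for the real monitor metadata and a densified set of road vertices standing in for the OSM line geometry (both are illustrative assumptions, not the project's actual data):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected coordinates in meters (e.g. EPSG:3310).
# In the real pipeline these come from AQS site metadata and the
# densified OSM major-road lines; here they are synthetic stand-ins.
rng = np.random.default_rng(42)
monitors = rng.uniform(0, 50_000, size=(12, 2))          # 12 monitor sites
road_vertices = rng.uniform(0, 50_000, size=(5_000, 2))  # densified road points

# Nearest-neighbor query gives the distance covariate d(s), in meters,
# for each monitor location s.
tree = cKDTree(road_vertices)
dist_to_road, _ = tree.query(monitors, k=1)

print(f"median distance to nearest road vertex: {np.median(dist_to_road):.0f} m")
```

Densifying the road lines into vertices keeps the query a simple point-to-point lookup; spatstat's `distfun` does the line-geometry version of the same thing.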
Exploratory Data Analysis (EDA)
Once the points were on a map, the sparse nature of the network jumped out. The 12 regulatory monitors average 8.84 µg/m³ (sd = 3.42, roughly 1–13 µg/m³), and most of them sit in the San Gabriel Valley or South Bay. Antelope Valley and the industrial southeast barely register. The first map simply shows where the monitors live; the second uses color to highlight the PM2.5 gradient.
Overlaying major roads makes the bias hard to unsee. Stations hug the I-10/I-710 corridor and the South Bay freeway tangle, while the northern desert and the harbor edges stay blank. I reused the slide figure so the story stays consistent between the deck and this page.
Coloring each site by its distance to the nearest major road tells the same story—warm hues practically sit on the asphalt, so the network seems designed around traffic exposure.
The scatter between mean PM2.5 and road distance slopes downward, which made me pretty sure the regression would flag a significant distance effect.
A couple of other nuggets: the median monitor is only 0.37 km from a major road (ranging from 0.01 to 1.72 km); urban land-use tags dominate, so rural backdrops and coastal breezes almost never get sampled; and kernel-density plus LISA stats mark a high–high cluster south of downtown where elevated PM2.5 aligns with closely spaced monitors. That said, the small sample size keeps me cautious—one relocated site would change the visuals noticeably.
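The kernel-density piece of those nuggets can be sketched with scipy's `gaussian_kde`, again on synthetic coordinates; this is a stand-in for spatstat's edge-corrected `density.ppp`, not a reproduction of the project's surface:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic monitor coordinates in meters (stand-ins for the 12 real sites).
rng = np.random.default_rng(0)
pts = rng.normal(loc=[20_000, 20_000], scale=5_000, size=(12, 2))

# gaussian_kde expects variables in rows; evaluate on a coarse grid.
kde = gaussian_kde(pts.T)
xs, ys = np.meshgrid(np.linspace(0, 50_000, 50), np.linspace(0, 50_000, 50))
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

# The KDE integrates to ~1; scaling by n = 12 turns it into an
# intensity surface (expected points per unit area).
intensity = 12 * density
```

With only a dozen points the bandwidth choice dominates the picture, which is another reason the visuals shift if a single site relocates.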
Spatial Modeling Visuals
The next trio of figures translates those gut feelings into formal diagnostics, all in EPSG:3310 meters:
The kernel smoother and the model-based intensity surface peak over the same freeway-heavy basin, so the distance covariate is pulling real weight rather than serving as a decorative map. Meanwhile, the observed L-function rides above the CSR envelope across a wide range of distances, which is a polite way of saying “yes, the sites cluster even after you account for the road trend.” Pairing these visuals with the distance maps convinced me the regression results weren’t just statistical noise.
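The envelope logic behind that claim can be sketched in plain numpy. This toy version uses a rectangular window instead of the county polygon and skips edge correction, so it is illustrative only; the clustered "observed" pattern is synthetic, mimicking freeway-corridor siting:

```python
import numpy as np
from scipy.spatial.distance import pdist

def l_function(pts, r, area):
    """Uncorrected Ripley L: L(r) = sqrt(K(r)/pi), K from pairwise distances."""
    n = len(pts)
    d = pdist(pts)  # unordered pair distances; x2 below for ordered pairs
    k = np.array([area * 2 * np.sum(d <= ri) / (n * (n - 1)) for ri in r])
    return np.sqrt(k / np.pi)

rng = np.random.default_rng(1)
side, n, r = 50_000.0, 12, np.linspace(1, 20_000, 40)
area = side * side

# Synthetic observed pattern: two tight clusters of monitors.
centers = np.array([[15_000, 15_000], [35_000, 30_000]])
obs = centers[rng.integers(0, 2, n)] + rng.normal(0, 2_000, (n, 2))

# CSR envelope from Monte Carlo simulations of uniform points
# (the project used 999 simulations; 199 keeps this sketch quick).
sims = np.array([l_function(rng.uniform(0, side, (n, 2)), r, area)
                 for _ in range(199)])
lo, hi = sims.min(axis=0), sims.max(axis=0)
above = np.any(l_function(obs, r, area) > hi)
print("observed L exceeds CSR envelope somewhere:", above)
```

Under CSR, L(r) tracks r; an observed curve riding above the upper simulation band, as in the real diagnostics, signals clustering.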
Hypothesis Testing (Regression)
To keep myself honest, I fit an inhomogeneous Poisson point-process model with monitor intensity as a log-linear function of distance to the nearest major road:
$$\log \lambda(s) = \beta_0 + \beta_1\, d(s)$$

where $\lambda(s)$ is the expected monitor intensity at location $s$ and $d(s)$ is the distance to the nearest major road. The coefficients from methods.qmd landed at:

- $\beta_0 = -13.32$ (SE = 0.019, p < 0.001)
- $\beta_1 = -0.00111$ per meter (SE = $2.5 \times 10^{-5}$, p < 0.001)
That negative distance coefficient means the fitted intensity drops quickly as you move away from freeways—exactly what the maps hinted at. It’s a small number, but in meters it stacks up fast. Quadrat tests and Monte Carlo envelopes didn’t flag any glaring misfit, so the distance term seems to capture the dominant first-order structure. That said, with only a dozen monitors, some second-order clustering hangs around simply because there aren’t enough points to smooth things out.
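To make "stacks up fast" concrete, a quick back-of-envelope computation of the fitted intensity decay implied by the reported coefficient:

```python
import numpy as np

beta1 = -0.00111  # fitted distance coefficient, per meter (from methods.qmd)

# Relative intensity lambda(d) / lambda(0) = exp(beta1 * d)
for d in (100, 500, 1000, 2000):
    print(f"{d:>5} m from a major road: {np.exp(beta1 * d):.2f}x baseline intensity")
```

At 1 km the fitted intensity is already down to roughly a third of its at-road value (exp(-1.11) ≈ 0.33), which matches the 0.37 km median monitor distance seen in the EDA.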
Key Findings
Question 1: Are monitors more concentrated near major roads?
Pretty much. The regression’s distance term is negative, significant, and large enough that intensity plummets as soon as you leave a freeway corridor. The road overlays and distance maps tell the same story, so I feel comfortable saying the siting strategy chases traffic.
Question 2: Do monitors show clustering beyond CSR?
Also yes. The observed L-function rides above the CSR envelope for a wide span of distances, and the kernel intensity contours echo that ridge. Even after accounting for the distance effect, residual clustering hangs on, which tracks with how few sites we’re dealing with.
Implications.
Because of this bias, huge swaths of the county lack a local monitor while road-adjacent areas get sampled repeatedly. That setup probably overplays traffic emissions and underplays port or warehouse plumes. Any exposure assessment or compliance check using this network needs to correct for the sampling bias or augment the system with low-cost sensors in the neighborhoods that currently have no voice in the data.
Conclusion
Pulling together EPA PM2.5 data, TIGER boundaries, and OSM roads let me run a reproducible point-pattern deep dive on Los Angeles. Every diagnostic pointed in the same direction: twelve monitors cover more than ten thousand square kilometers, and most of them sit within a short walk of the same freeway spine. The inhomogeneous Poisson model puts numbers on that bias and the L-function shows clustering well beyond random placement, so this isn’t just a cartography quirk.
For planners, consultants, or businesses leaning on this network, that imbalance matters. It likely inflates traffic-related exposure while muting port or warehouse impacts, which means investments could drift toward already over-monitored corridors. A modest expansion (maybe a few regulatory relocations plus low-cost sensors in inland and harbor communities) could rebalance the coverage and give residents data that better reflects their lived air. Worth thinking about before the next compliance cycle arrives.